Improving AdaBoost for Classification on Small Training Sample Sets with Active Learning

Authors

  • Xuchun Li
  • Lei Wang
  • Eric Sung
Abstract

Recently, AdaBoost has been widely used in many computer vision applications and has shown promising results. However, its classification performance is often poor when the training sample set is small. In many situations there are abundant unlabelled samples available, but labelling them is costly and time-consuming, so it is desirable to pick a few informative samples to be labelled. The key question is how. In this paper, we integrate active learning with AdaBoost to attack this problem. The principal idea is to select as the next sample to label the unlabelled sample at the minimum distance from the optimal AdaBoost hyperplane derived from the current set of labelled samples. Using the concept of version space, we prove that this selection strategy yields the fastest expected learning rate. Experimental results on both artificial and standard databases demonstrate the effectiveness of the proposed method.
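The selection strategy described in the abstract can be illustrated with a minimal sketch: train AdaBoost on the labelled pool, then query the unlabelled point whose ensemble margin |F(x)| is smallest, i.e. the point closest to the decision boundary. This is a hedged, self-contained toy (pure-Python AdaBoost over decision stumps); the function names and the 1-D data are hypothetical, not from the paper.

```python
import math

def train_adaboost(X, y, n_rounds=10):
    """Discrete AdaBoost with decision stumps; y in {-1, +1}.
    Returns a list of (alpha, feature, threshold, sign) tuples."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(n_rounds):
        best = None  # (weighted error, feature, threshold, sign)
        for feat in range(len(X[0])):
            for thresh in sorted({x[feat] for x in X}):
                for sign in (1, -1):
                    err = sum(wi for xi, yi, wi in zip(X, y, w)
                              if (sign if xi[feat] >= thresh else -sign) != yi)
                    if best is None or err < best[0]:
                        best = (err, feat, thresh, sign)
        err, feat, thresh, sign = best
        err = max(min(err, 1 - 1e-10), 1e-10)          # avoid log(0)
        alpha = 0.5 * math.log((1 - err) / err)
        ensemble.append((alpha, feat, thresh, sign))
        # Re-weight: misclassified samples gain weight, then renormalize.
        w = [wi * math.exp(-alpha * yi * (sign if xi[feat] >= thresh else -sign))
             for xi, yi, wi in zip(X, y, w)]
        z = sum(w)
        w = [wi / z for wi in w]
    return ensemble

def score(ensemble, x):
    """Signed ensemble output F(x); |F(x)| acts as a margin-based
    proxy for distance from the decision boundary."""
    return sum(a * (s if x[f] >= t else -s) for a, f, t, s in ensemble)

def query_next(ensemble, unlabelled):
    """Active-learning step: pick the unlabelled point with the
    smallest absolute margin, i.e. closest to the boundary."""
    return min(unlabelled, key=lambda x: abs(score(ensemble, x)))

# Toy 1-D pool: one noisy label at x=3 keeps the problem non-trivial.
X_lab = [[0.0], [1.0], [2.0], [3.0], [4.0], [5.0]]
y_lab = [-1, -1, 1, -1, 1, 1]
ens = train_adaboost(X_lab, y_lab, n_rounds=3)

X_unlab = [[0.2], [2.4], [4.8]]
print(query_next(ens, X_unlab))  # → [2.4], the point nearest the boundary
```

After the queried point is labelled by an oracle, it would be moved into the labelled pool and the ensemble retrained; the paper's argument is that this minimum-margin choice shrinks the version space fastest in expectation.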


Related articles

An anti-spam filter based on one-class IB method in small training sets

We present an approach to email filtering based on the one-class Information Bottleneck (IB) method for small training sets. When the themes of emails change continually, the available training set that is highly relevant to the current theme will be small. Hence, we further show how to estimate the learning algorithm and how to filter spam with small training sets. First, in order to preserv...


Using Validation Sets to Avoid Overfitting in AdaBoost

AdaBoost is a well known, effective technique for increasing the accuracy of learning algorithms. However, it has the potential to overfit the training set because its objective is to minimize error on the training set. We demonstrate that overfitting in AdaBoost can be alleviated in a time-efficient manner using a combination of dagging and validation sets. Half of the training set is removed ...
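The snippet above is cut short, but its core idea — use held-out data to decide when to stop boosting — can be sketched generically. This is a simplified early-stopping variant, not the dagging-based procedure the paper proposes; `staged_val_errors` is a hypothetical per-round validation error sequence obtained by evaluating the partial ensembles F_1..F_T.

```python
def best_round(staged_val_errors):
    """Return the 1-based boosting round whose partial ensemble has
    the lowest validation error; ties favour the smaller ensemble."""
    best_t, best_err = 1, staged_val_errors[0]
    for t, err in enumerate(staged_val_errors, start=1):
        if err < best_err:
            best_t, best_err = t, err
    return best_t

# Typical overfitting curve: validation error falls, then rises again
# as later rounds start fitting noise in the training set.
errors = [0.30, 0.22, 0.18, 0.15, 0.16, 0.19, 0.21]
print(best_round(errors))  # → 4
```

Truncating the ensemble at the selected round discards the rounds that only improved training error, which is the general mechanism the title refers to.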


The Boosting Approach to Machine Learning An Overview

Boosting is a general method for improving the accuracy of any given learning algorithm. Focusing primarily on the AdaBoost algorithm, this chapter overviews some of the recent work on boosting including analyses of AdaBoost’s training error and generalization error; boosting’s connection to game theory and linear programming; the relationship between boosting and logistic regression; extension...


VipBoost: A More Accurate Boosting Algorithm

Boosting is a well-known method for improving the accuracy of many learning algorithms. In this paper, we propose a novel boosting algorithm, VipBoost (voting on boosting classifications from imputed learning sets), which first generates multiple incomplete datasets from the original dataset by randomly removing a small percentage of observed attribute values, then uses an imputer to fill in th...


Parameter Inference of Cost-Sensitive Boosting Algorithms

Several cost-sensitive boosting algorithms have been reported as effective methods for dealing with the class imbalance problem. Misclassification costs, which reflect the different levels of class identification importance, are integrated into the weight update formula of the AdaBoost algorithm. Yet, it has been shown that the weight update parameter of AdaBoost is induced so that the training error can b...




Journal:

Publication year: 2003